# High-precision Action Recognition
Xclip Large Patch14 16 Frames
MIT
X-CLIP is an extension of CLIP for general video-language understanding; it is trained with contrastive learning and supports video classification and video-text retrieval.
Text-to-Video
Transformers English

microsoft
678
3
Xclip Large Patch14
MIT
X-CLIP is an extension of CLIP for general video-language understanding, trained via contrastive learning on (video, text) pairs.
Text-to-Video
Transformers English

microsoft
1,698
11
Xclip Base Patch32 16 Frames
MIT
X-CLIP is an extended version of CLIP for general video-language understanding, trained on video-text pairs via contrastive learning, suitable for tasks like video classification and video-text retrieval.
Text-to-Video
Transformers English

microsoft
901
4